Hidden Conditional Neural Fields for Continuous Phoneme Speech Recognition

Authors

  • Yasuhisa Fujii
  • Kazumasa Yamamoto
  • Seiichi Nakagawa
Abstract

In this paper, we propose Hidden Conditional Neural Fields (HCNF) for continuous phoneme speech recognition. HCNF combines Hidden Conditional Random Fields (HCRF) with a MultiLayer Perceptron (MLP) and inherits the merits of both: the discriminative property for sequences from HCRF and the ability to extract non-linear features from an MLP. HCNF can incorporate many types of features from which non-linear features can be extracted, and is trained by sequential criteria. We first present the formulation of HCNF and then examine three methods to further improve automatic speech recognition using HCNF: an objective function that explicitly considers training errors, a hierarchical tandem-style feature, and a deep non-linear feature extractor for the observation function. Using experimental results for continuous English phoneme recognition on the TIMIT core test set and Japanese phoneme recognition on the IPA 100 test set, we show that HCNF can be trained realistically without any initial model and outperforms HCRF and the triphone hidden Markov model trained with the minimum phone error (MPE) criterion.

Key words: hidden conditional neural fields, hidden conditional random fields, hidden Markov model, speech recognition, deep learning
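The idea of pairing a sequence-level discriminative model with MLP-style non-linear features can be illustrated with a small sketch. The code below is not the authors' formulation: it assumes a single layer of sigmoid gates as the observation function, omits the hidden states that HCNF inherits from HCRF, and uses plain conditional log-likelihood as the sequential criterion. All names (`gate_features`, `observation_scores`, `theta`, `w`, `trans`) and all numeric values are hypothetical and chosen only for illustration.

```python
import math
from itertools import product

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gate_features(x, theta):
    """Non-linear features for one frame: h_k = sigmoid(theta_k . x)."""
    return [sigmoid(sum(t * xi for t, xi in zip(row, x))) for row in theta]

def observation_scores(frames, theta, w):
    """obs[t][y] = w[y] . gate_features(frames[t]) (assumed observation function)."""
    obs = []
    for x in frames:
        h = gate_features(x, theta)
        obs.append([sum(wk * hk for wk, hk in zip(w_y, h)) for w_y in w])
    return obs

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def sequence_score(labels, obs, trans):
    """Unnormalized log-score: observation terms plus label-transition terms."""
    score = obs[0][labels[0]]
    for t in range(1, len(labels)):
        score += trans[labels[t - 1]][labels[t]] + obs[t][labels[t]]
    return score

def log_partition(obs, trans):
    """Forward algorithm in the log domain over all label sequences."""
    n = len(obs[0])
    alpha = list(obs[0])
    for t in range(1, len(obs)):
        alpha = [
            obs[t][y] + logsumexp([alpha[p] + trans[p][y] for p in range(n)])
            for y in range(n)
        ]
    return logsumexp(alpha)

def sequence_log_prob(labels, obs, trans):
    """Sequence-level conditional log-likelihood log p(labels | frames)."""
    return sequence_score(labels, obs, trans) - log_partition(obs, trans)

# Toy data (hypothetical): 3 frames of 2-dim features, 3 gates, 2 labels.
frames = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
theta = [[0.4, -0.6], [1.0, 0.3], [-0.7, 0.9]]
w = [[1.0, -0.5, 0.2], [-0.3, 0.8, 0.6]]
trans = [[0.2, -0.1], [-0.4, 0.3]]

obs = observation_scores(frames, theta, w)
print(sequence_log_prob([0, 1, 1], obs, trans))
```

Because the score is normalized over whole label sequences rather than per frame, the probabilities of all possible label sequences for the toy input sum to one, which is the property that makes the training criterion sequential.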

Similar resources

Deep-hidden Conditional Neural Fields for Continuous Phoneme Speech Recognition

We have proposed Hidden Conditional Neural Fields (HCNF) for automatic speech recognition and shown their effectiveness through continuous phoneme recognition experiments on the TIMIT and the Japanese ASJ+JNAS corpora. In this paper, we propose to use an observation function with a deep structure in HCNF. The proposed deep observation function makes it possible to use deep neural networks in HCNF, which hav...

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research shows that using deep neural networks (DNNs) in speech recognition systems significantly improves their performance. DNN-based phoneme recognition systems have two phases: training and testing. Mos...

Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks

In a hybrid hidden Markov model/artificial neural network (HMM/ANN) automatic speech recognition (ASR) system, the phoneme class conditional probabilities are estimated by first extracting acoustic features from the speech signal based on prior knowledge, such as speech perception and/or speech production knowledge, and then modeling the acoustic features with an ANN. Recent advances in machine...

Improving the Performance of a Continuous Speech Recognition System Using Features Extracted from Speech Manifolds in the Reconstructed Phase Space

Designing new feature extraction methods for the speech signal and combining the information they provide is one of the most effective approaches to improving the performance of automatic speech recognition (ASR) systems. Recent research has shown that the speech signal contains nonlinear and chaotic properties, but the effects of these properties are not used in the continuous ...

Improving LVCSR with hidden conditional random fields for grapheme-to-phoneme conversion

In virtually every state-of-the-art large vocabulary continuous speech recognition (LVCSR) system, grapheme-to-phoneme (G2P) conversion is applied to generalize beyond a fixed set of words given by a background lexicon. The overall performance of the G2P system has a strong effect on the recognition quality. Typically, generative models based on joint-n-grams are used, although some discriminat...

Journal title:
  • IEICE Transactions

Volume 95-D  Issue 

Pages  -

Publication date 2012